Premier League Analysis

The English Premier League is the top level of competition in English football. It is widely regarded as one of the most competitive leagues and is one of the most watched sports competitions in the world.

I found a dataset which contains these data for each season of the Premier League:

I have downloaded the data for 10 seasons, from 2011/2012 to 2020/2021, and I use this data throughout the project.

Data Source: https://www.football-data.co.uk/englandm.php

In this project I answer some questions like:

This project consists of 3 phases:

Phase 1: Data Wrangling

Phase 2: Data Analysis

Phase 3: Machine Learning

Phase 1: Data Wrangling

First let's start by looking at the data for the 2020/2021 season

In my analysis I will not be looking into betting odds so I will start by dropping all the betting columns

The "Div" column has no importance in my analysis so I will drop this column

I will add a "Year" column to differentiate between different seasons
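The three clean-up steps above can be sketched roughly as follows. This is a minimal sketch, not necessarily how the notebook does it: the helper name clean_season and the keep-list approach are assumptions, the column names follow the football-data.co.uk column notes, and the two sample rows are made up.

```python
import pandas as pd

# Match-level columns to keep; everything else (the betting odds and "Div") is dropped.
# This keep-list is an assumption based on the football-data.co.uk column notes.
MATCH_COLUMNS = [
    "Date", "Time", "HomeTeam", "AwayTeam", "FTHG", "FTAG", "FTR",
    "HTHG", "HTAG", "HTR", "Referee", "HS", "AS", "HST", "AST",
    "HF", "AF", "HC", "AC", "HY", "AY", "HR", "AR",
]


def clean_season(season_df: pd.DataFrame, year: str) -> pd.DataFrame:
    """Return a copy without 'Div' and betting columns, plus a 'Year' column."""
    keep = [col for col in MATCH_COLUMNS if col in season_df.columns]
    cleaned = season_df[keep].copy()
    cleaned["Year"] = year
    return cleaned


# Made-up sample with one betting column (B365H), for illustration only.
sample = pd.DataFrame({
    "Div": ["E0", "E0"],
    "Date": ["12/09/2020", "12/09/2020"],
    "HomeTeam": ["Team A", "Team C"],
    "AwayTeam": ["Team B", "Team D"],
    "FTHG": [0, 1],
    "FTAG": [3, 0],
    "FTR": ["A", "H"],
    "B365H": [6.0, 3.1],
})
cleaned_sample = clean_season(sample, "2020/2021")
print(cleaned_sample.columns.tolist())
```

Keeping a list of wanted columns avoids having to enumerate the betting columns, which differ from season to season.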

From the cell below we can see that there are no null values in our data

Now I will create a function to calculate the end of season table

Function inputs: Name of teams, season dataframe, year

Function output: DataFrame containing the end of season table
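A function with these inputs and output could look something like the sketch below. It assumes 3 points for a win and 1 for a draw, and ranks teams by points, then goal difference, then goals scored. The function name, the output column names and the three sample matches are all made up for illustration.

```python
import pandas as pd


def end_of_season_table(teams, season_df, year):
    """Build a league table from match results (3 points per win, 1 per draw)."""
    rows = []
    for team in teams:
        home = season_df[season_df["HomeTeam"] == team]
        away = season_df[season_df["AwayTeam"] == team]
        wins = int((home["FTR"] == "H").sum() + (away["FTR"] == "A").sum())
        draws = int((home["FTR"] == "D").sum() + (away["FTR"] == "D").sum())
        played = len(home) + len(away)
        goals_for = int(home["FTHG"].sum() + away["FTAG"].sum())
        goals_against = int(home["FTAG"].sum() + away["FTHG"].sum())
        rows.append({
            "Year": year, "Team": team, "Played": played,
            "Wins": wins, "Draws": draws, "Losses": played - wins - draws,
            "GF": goals_for, "GA": goals_against,
            "GD": goals_for - goals_against, "Points": 3 * wins + draws,
        })
    # Rank by points, then goal difference, then goals scored.
    table = pd.DataFrame(rows).sort_values(
        ["Points", "GD", "GF"], ascending=False
    ).reset_index(drop=True)
    table.insert(0, "Position", list(range(1, len(table) + 1)))
    return table


# Made-up mini season with three teams.
season = pd.DataFrame({
    "HomeTeam": ["Team A", "Team B", "Team C"],
    "AwayTeam": ["Team B", "Team C", "Team A"],
    "FTHG": [2, 0, 1],
    "FTAG": [1, 0, 3],
    "FTR": ["H", "D", "A"],
})
table = end_of_season_table(["Team A", "Team B", "Team C"], season, "2020/2021")
print(table)
```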

Now I will create a function to calculate the points of each team after each match

Function inputs: Name of teams, season dataframe, year

Function output: DataFrame containing the points of each team after each match
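One way to sketch such a function is a running total of points per team. The sketch assumes the rows of the season DataFrame are already in chronological order; the function name, the long output layout (one row per team per match) and the sample matches are assumptions for illustration.

```python
import pandas as pd


def points_after_each_match(teams, season_df, year):
    """Return cumulative points per team, one row per match played."""
    frames = []
    for team in teams:
        is_home = season_df["HomeTeam"] == team
        is_away = season_df["AwayTeam"] == team
        played = season_df[is_home | is_away]
        # A team wins when it is at home and FTR is "H", or away and FTR is "A".
        won = ((played["HomeTeam"] == team) & (played["FTR"] == "H")) | (
            (played["AwayTeam"] == team) & (played["FTR"] == "A")
        )
        drew = played["FTR"] == "D"
        points = won.astype(int) * 3 + drew.astype(int)
        frames.append(pd.DataFrame({
            "Year": year,
            "Team": team,
            "Match": list(range(1, len(played) + 1)),
            "Points": points.cumsum().to_numpy(),
        }))
    return pd.concat(frames, ignore_index=True)


# Made-up mini season with three teams, in chronological order.
season = pd.DataFrame({
    "HomeTeam": ["Team A", "Team B", "Team C"],
    "AwayTeam": ["Team B", "Team C", "Team A"],
    "FTR": ["H", "D", "A"],
})
running_points = points_after_each_match(
    ["Team A", "Team B", "Team C"], season, "2020/2021"
)
print(running_points)
```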

Working With All 10 Seasons

After working with one dataset, I have made some initial assumptions and written some functions that I will use when working on all 10 datasets together

I've noticed that some of the season files don't contain a "Time" column, so I will add that column with a string saying "Not Available"

Now I will create a function that drops the betting columns from all 10 datasets, adds a "Year" column to each dataset, and adds a "Time" column to the datasets that don't have one

Function inputs: List of Datasets

Function output: None
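Because the function returns None, it presumably modifies each dataset in place. A minimal sketch under that assumption is shown below. The years and keep_columns parameters are hypothetical additions (the description only lists the datasets as input), and the sample rows are made up.

```python
import pandas as pd


def clean_datasets(datasets, years, keep_columns):
    """Clean each season DataFrame in place; returns None.

    `years` and `keep_columns` are hypothetical parameters added for this sketch.
    """
    for season_df, year in zip(datasets, years):
        # Drop everything that is not a wanted match column (betting odds, "Div").
        unwanted = [col for col in season_df.columns if col not in keep_columns]
        season_df.drop(columns=unwanted, inplace=True)
        if "Time" not in season_df.columns:
            season_df["Time"] = "Not Available"
        season_df["Year"] = year


# Made-up one-row seasons: one with a "Time" column, one without.
with_time = pd.DataFrame({
    "Div": ["E0"], "Time": ["12:30"], "HomeTeam": ["Team A"],
    "AwayTeam": ["Team B"], "FTR": ["H"], "B365H": [2.0],
})
without_time = pd.DataFrame({
    "Div": ["E0"], "HomeTeam": ["Team C"],
    "AwayTeam": ["Team D"], "FTR": ["D"], "B365H": [3.0],
})
seasons = [with_time, without_time]
outcome = clean_datasets(
    seasons, ["2020/2021", "2011/2012"],
    keep_columns=["Time", "HomeTeam", "AwayTeam", "FTR"],
)
print(without_time)
```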

Now I will create a function to group the matches for all 10 cleaned datasets

Function inputs: List of Datasets

Function output: DataFrame containing the matches for all 10 cleaned datasets
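Grouping the matches can be as simple as stacking the cleaned season DataFrames. The sketch below assumes every dataset already has the same columns after cleaning; the function name and the sample rows are made up.

```python
import pandas as pd


def combine_matches(datasets):
    """Stack the cleaned season DataFrames into one DataFrame of all matches."""
    return pd.concat(datasets, ignore_index=True)


# Two made-up cleaned seasons with identical columns.
season_one = pd.DataFrame({
    "HomeTeam": ["Team A"], "AwayTeam": ["Team B"],
    "FTR": ["H"], "Year": ["2019/2020"],
})
season_two = pd.DataFrame({
    "HomeTeam": ["Team B", "Team C"], "AwayTeam": ["Team A", "Team A"],
    "FTR": ["D", "A"], "Year": ["2020/2021", "2020/2021"],
})
all_matches = combine_matches([season_one, season_two])
print(all_matches)
```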

Now I will create a function to group the names of teams for each season for all 10 cleaned datasets

Function inputs: List of Datasets

Function output: DataFrame containing the names of teams for each season for all 10 cleaned datasets
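A possible sketch of this step collects the distinct home and away team names of each season into one column per season. It assumes each cleaned dataset has a "Year" column and that every season has the same number of teams (20 in the Premier League), so the columns line up. The function name and the sample data are made up.

```python
import pandas as pd


def team_names_per_season(datasets):
    """Return a DataFrame with one column per season listing that season's teams."""
    names = {}
    for season_df in datasets:
        year = season_df["Year"].iloc[0]
        teams = set(season_df["HomeTeam"]) | set(season_df["AwayTeam"])
        names[year] = sorted(teams)
    return pd.DataFrame(names)


# Two made-up seasons with two teams each.
season_one = pd.DataFrame({
    "HomeTeam": ["Team A", "Team B"], "AwayTeam": ["Team B", "Team A"],
    "Year": ["2019/2020", "2019/2020"],
})
season_two = pd.DataFrame({
    "HomeTeam": ["Team C", "Team A"], "AwayTeam": ["Team A", "Team C"],
    "Year": ["2020/2021", "2020/2021"],
})
team_names = team_names_per_season([season_one, season_two])
print(team_names)
```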

Now I will create a function to group the end-of-year table each season for all 10 cleaned datasets

Function inputs: List of Datasets, List of team names for each season

Function output: DataFrame containing the end-of-year table each season for all 10 cleaned datasets

Now I will create a function to group the points after each match for all 10 cleaned datasets

Function inputs: List of Datasets, List of team names for each season

Function output: DataFrame containing the points after each match for all 10 cleaned datasets

Phase 2: Data Analysis

In this section I work on the 3 datasets from the last part

1- totalEndOfSeasonTables
2- totalmatchesDf
3- totalMatchPointsOfEverySeason

1- totalEndOfSeasonTables

First let's see the number of all teams who played through the 10 seasons

Now let's look at the total points each team got through the years

We look at the winning teams for the last 10 years

Now let's look at the positions of the teams at the end of each of the 10 seasons

Now we look at the teams that qualified for the Champions League

Now let's see which teams stayed in the Premier League for all 10 seasons
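One way to find these teams is to count the distinct seasons each team appears in and keep those that appear in every season. The sketch assumes the combined end-of-season tables have "Year" and "Team" columns; the function name and sample data are made up.

```python
import pandas as pd


def ever_present_teams(tables):
    """Return the teams that appear in every season of the combined tables."""
    season_count = tables["Year"].nunique()
    seasons_per_team = tables.groupby("Team")["Year"].nunique()
    return sorted(seasons_per_team[seasons_per_team == season_count].index)


# Made-up combined tables covering two seasons.
tables = pd.DataFrame({
    "Year": ["2019/2020", "2019/2020", "2020/2021", "2020/2021"],
    "Team": ["Team A", "Team B", "Team A", "Team C"],
})
stayed = ever_present_teams(tables)
print(stayed)
```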

2- totalmatchesDf

First I start by taking a general look at the data and getting some insights

Now let's take a look at the referees

Now we look at cards given to home vs away teams
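A small sketch of this comparison sums the yellow and red card columns for each side. It uses the football-data.co.uk column names HY/AY (home/away yellow cards) and HR/AR (home/away red cards); the numbers are made up.

```python
import pandas as pd

# Made-up card counts for three matches.
matches = pd.DataFrame({
    "HY": [1, 2, 0],
    "AY": [3, 2, 1],
    "HR": [0, 0, 1],
    "AR": [0, 1, 0],
})

# One row per side, one column per card colour.
card_totals = pd.DataFrame(
    {
        "Yellow": [matches["HY"].sum(), matches["AY"].sum()],
        "Red": [matches["HR"].sum(), matches["AR"].sum()],
    },
    index=["Home", "Away"],
)
print(card_totals)
```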

Now we look at the comebacks

I define a comeback as: a team was losing at half time, then in the second half the team drew or won the match
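This definition can be expressed with the half-time result (HTR) and full-time result (FTR) columns, where "H", "D" and "A" mean home win, draw and away win. The sketch below is one possible implementation; the function name, the ComebackTeam column and the sample matches are made up.

```python
import pandas as pd


def find_comebacks(matches):
    """Return the matches where the side losing at half time drew or won."""
    # Home side was behind at half time and ended level or ahead.
    home_comeback = (matches["HTR"] == "A") & matches["FTR"].isin(["D", "H"])
    # Away side was behind at half time and ended level or ahead.
    away_comeback = (matches["HTR"] == "H") & matches["FTR"].isin(["D", "A"])
    is_comeback = home_comeback | away_comeback
    comeback_team = matches["HomeTeam"].where(home_comeback, matches["AwayTeam"])
    result = matches[is_comeback].copy()
    result["ComebackTeam"] = comeback_team[is_comeback]
    return result


# Made-up matches: the first two are comebacks, the last two are not.
matches = pd.DataFrame({
    "HomeTeam": ["Team A", "Team C", "Team E", "Team G"],
    "AwayTeam": ["Team B", "Team D", "Team F", "Team H"],
    "HTR": ["A", "H", "H", "D"],
    "FTR": ["H", "D", "H", "A"],
})
comebacks = find_comebacks(matches)
print(comebacks)
```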

Now we look at some statistics about the shots

Now we look at some team statistics through the seasons

Now we look at which features contribute more to winning a match

I start by looking at the probability of a team winning when it has more shots, more shots on target, more corners, more fouls, or fewer cards than its opponent. Then I turn the results into percentages and group them into a graph
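A rough sketch of one such percentage is shown below. It makes two assumptions that the text does not settle: matches where both sides are level on the statistic are ignored, and draws count as not winning. The function name and the sample numbers are made up; HS/AS and HY/AY are the football-data.co.uk columns for shots and yellow cards.

```python
import pandas as pd


def win_rate_with_more(matches, home_col, away_col, fewer=False):
    """Percentage of matches won by the side with more (or fewer) of a stat."""
    diff = matches[home_col] - matches[away_col]
    if fewer:
        diff = -diff  # the side with the lower count is now the one "ahead"
    decided = matches[diff != 0]  # ignore matches where the stat is level
    home_ahead = diff[diff != 0] > 0
    ahead_side_won = (home_ahead & (decided["FTR"] == "H")) | (
        ~home_ahead & (decided["FTR"] == "A")
    )
    return 100 * ahead_side_won.mean()


# Made-up statistics for four matches.
matches = pd.DataFrame({
    "HS": [10, 5, 8, 7],
    "AS": [4, 9, 8, 3],
    "HY": [1, 2, 3, 0],
    "AY": [2, 2, 1, 1],
    "FTR": ["H", "A", "D", "D"],
})
more_shots_rate = win_rate_with_more(matches, "HS", "AS")
fewer_cards_rate = win_rate_with_more(matches, "HY", "AY", fewer=True)
print(round(more_shots_rate, 1), round(fewer_cards_rate, 1))
```

The same helper can be called with HST/AST, HC/AC and HF/AF to cover shots on target, corners and fouls.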

Now I draw a graph showing the contribution of each feature to winning a match

3- totalMatchPointsOfEverySeason

In this section I look at the performance of each of the teams that stayed in the Premier League for all 10 seasons

I draw a graph for each team showing the number of points the team got after each match for all the seasons